DACS Dewey index-based Arabic Document Categorization System
نویسندگان
چکیده
منابع مشابه
Automatic Arabic Document Categorization Based On The Naive Bayes Algorithm
This paper deals with automatic classification of Arabic web documents. Such a classification is very useful for affording directory search functionality, which has been used by many web portals and search engines to cope with an ever-increasing number of documents on the web. In this paper, Naive Bayes (NB) which is a statistical machine learning algorithm, is used to classify non-vocalized Ar...
متن کاملPhrase-Based Document Categorization
(Chapter in Springer book ”Current Challenges in Patent Information Retrieval”, to appear in May 2011) This paper takes a fresh look at an old idea in Information Retrieval: the use of linguistically extracted phrases as terms in the automatic categorization of documents, and in particular the pre-classification of patent applications. In Information Retrieval, until now there was found little ...
متن کاملThe Effect of Preprocessing on Arabic Document Categorization
Preprocessing is one of the main components in a conventional document categorization (DC) framework. This paper aims to highlight the effect of preprocessing tasks on the efficiency of the Arabic DC system. In this study, three classification techniques are used, namely, naive Bayes (NB), k-nearest neighbor (KNN), and support vector machine (SVM). Experimental analysis on Arabic datasets revea...
متن کاملAn Intelligent System for Arabic Text Categorization
Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. In this paper, an intelligent Arabic text categorization system is presented. Machine learning algorithms are used in this system. Many algorithms for stemming and feature selection are tried. Moreover, the document is represented using several term weighting ...
متن کاملArabic Text Categorization
In this paper, we compare the performance of three classifiers for Arabic text categorization. In particular, the naïve Bayes, k-nearest-neighbors (knn), and distance-based classifiers were used. Unclassified documents were preprocessed by removing punctuation marks and stopwords. Each document is then represented as a vector of words (or of words and their frequencies as in the case of the naï...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Computer Applications
سال: 2012
ISSN: 0975-8887
DOI: 10.5120/7500-0634